Introduction

The EU Horizon SURIMI project aims to deliver a suite of ready-to-use socio-economic and ecological simulation models that are integrated into the EU Digital Twin Ocean. These models leverage a wide range of EU fishery activity data, including datasets from the Fisheries Data Collection Framework (DCF), to support evidence-based policy and sustainable management of marine resources. However, most data are collected in an aggregated form, which limits their spatial resolution and applicability to the fine-scale assessment and simulation of vessel behaviour. For these reasons, one of the objectives of the SURIMI project is to create a set of algorithms to disaggregate the socio-economic dataset at a finer spatial and technical level or even at the level of individual fishing vessels, using public data and online resources (DCF and Global Fishing Watch). 

Summary of the procedure

This procedure (Fig XX) outlines a method to estimate fisheries economic indicators at the level of individual vessels and ICES cells by integrating multiple data sources through a disaggregation methodology. Aggregated data from the EU’s FDI—including effort and landing data —is provided at the ICES cell level and by metier, while the AER database, including economic indicators, is grouped by supra-region, country and fleet segment. Global Fishing Watch (GFW) data is used to provide total fishing hours per vessel and harbour of landings.

Firstly, the FDI data were matched to the Annual Economic Report of Fisheries (AER) according to vessel length, in order to extract the relevant economic indicators. For this example, the focus was on the Gross Value of Landings (GVL); however, any information contained in the AER can be extracted.

In parallel, Global Fishing Watch (GFW) data was used to provide information on the effort (hours of fishing) of each individual vessel and their associated harbours for the selected case study. The MMSI identifier was used to match GFW individual vessels with the EU Fleet Register to assign vessel lengths. Finally, individual vessels were grouped by ICES cell and length to align with the FDI dataset.

Note: Here, we use public GFW data to demonstrate a procedure in cases where access to other types of data is not possible. However, the GFW dataset is restricted to a few fisheries as it only shows those with AIS coverage. However, if more detailed data are available—such as those obtained from the Vessel Monitoring System (VMS)—the procedure can be reliably replicated using that dataset as well.

The FDI-AER and GFW datasets were merged using common identifiers (ICES cell and vessel length). The resulting combined dataset allowed disaggregation of FDI-AER data by individual vessel, computed by dividing total values by the number of vessels in each vessel length and ICES cell. This enabled spatially explicit estimates of GVL at the vessel level.

Procedure scheme
Procedure scheme

1 Protocol - FDI effort by ices cell

In this session, the methodology for disaggregating data will be systematically explained through the application in a specific area that is the GSA09 (Northern Tyrrhenian Sea).

Firstly, to ensure proper data management, it is necessary to download and save the data in a folder specifically dedicated to this purpose.

The data to be downloaded includes:

Data Input
Data Description
FDI effort and landing Effort and landing data divided by year, country, GSA, gear, vessel length and ices-cell. The geographical reference, expressed as ices-cell longitude-latitude coordinates, is provided in the shapefiles.
AER Economic indicators by year, country, GSA, gear, and vessel length.
Fleet register Descriptive information on individual vessels: vessel name, MMSI identifier, vessel length, port of registration, tonnage, power, gear, etc.
EMODNET main ports for the European Seas Main ports’ locations data from 1997 to 2024
FAO ASFIS List of Species for Fisheries The ASFIS (Aquatic Sciences and Fisheries Information System) list for fishery statistics represents the standard taxonomic reference system for the FAO Statistics Team.
FAO Geographical Sub-Areas FAO GFCM area of application, comprised of the Mediterranean and the Black Sea, as Major Fishing Area 37.

Save the data to a folder and set the folder as the data location in the R environment:

wd = "SET YOUR DATA FOLDER DIRECTORY"
library(curl)
library(dplyr)
library(doBy)
library(ggplot2)
library(ggrepel)
library(ggridges)
library(ggspatial)
library(gfwr)
library(gridExtra)
library(gtsummary)
library(leaflet)
library(openxlsx)
library(patchwork)
library(RColorBrewer)
library(reshape2)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(tidyverse)
library(tidytext)
library(terra)
library(VennDiagram)
library(webr)
library(webshot2)

Data manipulation for a case study area

Users could establish parameters for their case study, which will subsequently inform the procedure.

Here we test Italian Bottom Otter Trawlers (ITA-OTB) in 2021 for GSA09.

CS_name = "FAO GSA09 - Western Med"
Gear_CS = "OTB"
Year_CS = "2021"
Country_CS = "ITA"
Country_code = "IT"
GSAs_CS = "GSA09"
GSAa_CS = "GSA9"

Step 1 - Open and subset FDI data Effort

Users could establish parameters for their case study, which will subsequently inform the procedure. Here we test Italian Bottom Otter Trawlers (ITA-OTB) in 2021 for GSA09. Firstly, data were filtered for the selected case study. Results for FDI effort data and AER economic data were shown.

effort = read.csv(paste0(wd,"AER and FDI datasets (October 24)/FDI_spatial_data_EU28/EU28/spatial_effort_tableau_pts_EU28.csv")) 
effort = effort %>%  
         filter(year %in% Year_CS & gear_type %in% Gear_CS & cscode != "") %>%  
         mutate(totfishdays = as.numeric(totfishdays))

Subsequently, spatial ICES cells were used to map the total effort and landing data.

spatial_effort = read_sf(paste0(wd,"AER and FDI datasets (October 24)/FDI_spatial_data_EU28/EU28/effort_csquares.shp"))
spatial_effort = spatial_effort %>% 
                 filter(cscode != is.na(cscode) & cscode != "")

#Join data 

effort_sf = st_as_sf(left_join(effort, spatial_effort, by = "cscode"))

Then the GSA polygon was used to subset and plot the landing and effort on the map.

# Effort and landing by GSA

GSA = read_sf(paste0(wd,"GSAs_simplified.shp")) %>%
       filter(SECT_COD == GSAs_CS)

effort_GSA = effort_sf %>% 
              filter(sub_region == GSAa_CS)
            
effort_GSA = st_intersection(effort_GSA, GSA)


effort_sf = effort_GSA

CS = GSA

#Set parameter for the map

world <- ne_countries(scale = "medium", returnclass = "sf", continent = "europe")
world = st_transform(world, crs = st_crs(CS))

xmin = as.numeric(st_bbox(effort_sf)[1])-0.1
xmax = as.numeric(st_bbox(effort_sf)[3])+0.1
ymin = as.numeric(st_bbox(effort_sf)[2])-0.1
ymax = as.numeric(st_bbox(effort_sf)[4])+0.1

Total effort coverage for the case study area - resulting from FDI data

eff = ggplot()+
  geom_sf(data = effort_sf, aes(fill = log(totfishdays)), color = NA)+
  scale_fill_viridis_c(option = "A", na.value = "white")+ 
  geom_sf(data = world)+
  coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax))+
  annotation_scale(location = "tl", width_hint = 0.5) +
  annotation_north_arrow(location = "tl", which_north = "true", 
                         pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
                         style = north_arrow_fancy_orienteering) +
  ggtitle(paste0("FDI Effort_",CS_name,"_",Gear_CS,"_",Year_CS))+
  theme_light()+
  theme(legend.position = "bottom")

print(eff)

Save all the filtered data in a specific folder fd = “CaseStudy/Data/”

fd = "CaseStudy/Data/"

write_sf(effort_sf, paste0(fd,"FDI_effort_CS.shp"))

write.csv(
  st_drop_geometry(effort_sf) %>%
    rename(id = cscode, gear = gear_type, vlength = vessel_length, tot_fish_day = totfishdays), paste0(fd,"FDI_effort_CS.csv"), row.names = F)

remove(effort)

Descriptive analysis Effort

FDI_effort_CS = read.csv(paste0(fd,"FDI_effort_CS.csv")) %>% mutate(quarter = as.character(quarter))

ggplot()+
  geom_density_ridges(data = FDI_effort_CS, aes(y = fct_reorder(vlength,tot_fish_day), x = tot_fish_day, fill = vlength),alpha = 0.5)+
  theme_minimal()+
   scale_fill_brewer(palette = "Set1")+

  ylab("Vessel Length")+
  xlab("Total fishing days")+
  ggtitle(paste0("FDI Effort_",CS_name,"_",Gear_CS,"_",Year_CS, "\nby vessel length"))

FDI_CS_data = FDI_effort_CS

Step 2 - Open and subset AER data

AER_FS = read.xlsx(paste0(wd,"STECF_24-07_EU Fleet Economic and Transversal data/STECF 24-07 - EU Fleet Economic and Transversal data_fleet segment level.xlsx"), sheet = 2) %>% 
           filter(year %in% Year_CS & fishing_tech %in% c("DTS") & country_code %in% Country_CS) 

write.csv(AER_FS, paste0(fd,"Economic_data.csv"), row.names = F)

Descriptive analysis AER data

AER_CS = read.csv(paste0(fd,"Economic_data.csv"))

AER_CS = AER_CS %>%  
          rename(vlength = vessel_length) %>% 
          select(c(country_code, year, fishing_tech, vlength, variable_group, variable_name, variable_code, value, unit ))

data_sum = AER_CS %>% 
            group_by(country_code, fishing_tech,vlength, variable_group, variable_name, unit) %>% 
            summarise(val = sum(value,na.rm = T)) %>% 
            rename(gear = fishing_tech)

 data_sum %>% 
   group_by(variable_group, variable_name) %>% 
   mutate(val_prop = val/sum(val)*100) %>% 
   ggplot()+
   geom_bar(aes(y = variable_name, x= val_prop, fill = vlength), stat = "identity")+
   facet_wrap(~ variable_group, scales = "free")+ 
   scale_fill_brewer(palette = "Set1")+
   xlab("")+
   ylab("")+
   facet_wrap(~ gear)+
   theme_classic()

 AER_wide = AER_CS %>% 
            select(-c(variable_code, unit, variable_group)) %>% 
            dcast(...~ variable_name, value.var = "value")

 write.csv(AER_wide, paste0(fd,"Economic_data_wide.csv"), row.names = F)

Step 3 - Join AER with - FDI Effort-Landing

FDI_CS_data = read.csv(paste0(fd,"FDI_effort_CS.csv")) %>% mutate(quarter = as.character(quarter))
AER_CS = read.csv(paste0(fd,"Economic_data.csv")) %>% rename(vlength = vessel_length)


FDI_sub = unique(FDI_CS_data[,c("year","quarter", "vlength", "id", "tot_fish_day" )])
AER_sub = AER_wide[,c("year", "vlength", "Fishing days", "Days at sea")]

Step 4 - Find vessels track by Global Fishing Watch (GFW)

Extrapolate data

In this step, we will identify all vessels present in the CS area in a defined moment (here, we use the year 2021 as an example). The vessels were extrapolated from the GFW dataset, which uses AIS data to identify vessel tracks, fishing areas, and zones of navigation. Furthermore, it has the capacity to identify the ports visited by individual vessels. For more datails see https://globalfishingwatch.org/our-apis/

The use of gfwr requires a GFW API token, which users can request from the GFW API Portal. Save this token to your .Renviron file using usethis::edit_r_environ() and adding a variable named GFW_TOKEN to the file (GFW_TOKEN=“PASTE_YOUR_TOKEN_HERE”). Save the .Renviron file and restart the R session to make the edit effective.

gfwr functions are set to use key = gfw_auth() by default so in general you shouldn’t need to refer to the key in your function calls.

key = gfw_auth() 
CS_polygon <- sf::st_bbox(c(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
  crs = 4326) |>
  sf::st_as_sfc() |>
  sf::st_as_sf()

GFW_effort = get_raster(spatial_resolution = 'LOW',
                 temporal_resolution = 'MONTHLY',
                 group_by = 'VESSEL_ID',
                 start_date = "2021-01-01",
                 end_date = "2021-12-31",
                 region = CS_polygon,
                 region_source = 'USER_SHAPEFILE',
                 key = key)

colnames(GFW_effort) = make.names(colnames(GFW_effort))

GFW_effort %>% 
  group_by(Flag,Gear.Type) %>% 
  summarise(h = sum(Apparent.Fishing.Hours)) %>% 
  ggplot()+
  geom_bar(aes(x = h, y = reorder(Gear.Type, h), fill = Flag), stat = "identity")+
  ggtitle("GFW data from CS polygon")+
  xlab("Fishing hours")+
  ylab("Gear type")+
  theme_light()

write.csv(GFW_effort, paste0(fd,"GFW_effort_tot_CS.csv"))

Subset by Country, Gear, and CS Area

FDI_effort_CS_sf = read_sf(paste0(fd,"FDI_effort_CS.shp"))
GFW_effort = read.csv(paste0(fd,"GFW_effort_tot_CS.csv"))

GFW_effort_CS_sf = GFW_effort %>%
                     filter(Flag == "ITA" & Gear.Type %in% c("TRAWLERS")) %>% 
                     st_as_sf(coords = c("Lon", "Lat"), crs = 4326) 

GFW_effort_CS_sf$month = as.integer(substr(GFW_effort_CS_sf$Time.Range, 6,7))
GFW_effort_CS_sf$quarter = as.character(c(1,2,3,4)[findInterval(GFW_effort_CS_sf$month, c(1,3,6,9,13))])

Performs spatial joins quarter-by-quarter to ensure temporal alignment

FDI_effort_CS_sf_by_quarter = FDI_effort_CS_sf %>%
                      group_by(quarter,cscode, ger_typ) %>%
                      summarise(FDI_tot_fish_day_by_ICES = sum(ttfshdy))

quarter = c("1","2","3","4")

GFW_effort_CS_sf_grid <- NULL  
for(i in 1:length(quarter)) {
  a <- st_join(
    GFW_effort_CS_sf[which(GFW_effort_CS_sf$quarter %in% quarter[i]), ],
    FDI_effort_CS_sf_by_quarter[which(FDI_effort_CS_sf_by_quarter$quarter %in% quarter[i]), "cscode"],
    left = T
  )

  GFW_effort_CS_sf_grid <- rbind(GFW_effort_CS_sf_grid, a)
}
  
GFW_effort_CS_sf_grid = GFW_effort_CS_sf_grid %>% 
                        filter(!is.na(cscode)) %>% 
                        rename(id = cscode) 


write_sf(GFW_effort_CS_sf_grid, paste0(fd,"GFW_effort_CS_sf_grid.shp"))
write.csv(
  st_drop_geometry(GFW_effort_CS_sf_grid),paste0(fd,"GFW_effort_CS_sf_grid.csv"), row.names = F)

Step 5 - Find port visited by Global Fishing Watch (GFW)

After obtaining vessels within a specified area for a particular type of gear in a given year, the ports visited by these vessels are also downloaded. This process is undertaken in accordance with a methodology that has been described and validated by GFW. This provides a comprehensive overview of the port’s operational dynamics, including the place of landing and the number of vessels arriving at each port.

Please note that this phase can take a long time.

vID = unique(GFW_effort_CS_sf_grid$Vessel.ID)



# Initialize port_FV with correct column types
port_FV <- data.frame(
  port = character(), 
  lat = numeric(), 
  lon = numeric(), 
  vessel_name = character(),
  MMSI = character(),
  month = character(),
  stringsAsFactors = FALSE
)

for (i in 1:length(vID)) {
  
  port_event <- get_event(
    event_type   = 'PORT_VISIT',
    start_date   = "2021-01-01",
    end_date     = "2021-12-31",
    region       = CS_polygon,
    vessels      = vID[i],
    region_source= 'USER_SHAPEFILE',
    key          = key
  )
  
  if (is.null(port_event)) next
  
  for (j in 1:nrow(port_event)) {
    
    # Extract values safely, replacing NULL with NA
    port_name <- port_event$event_info[[j]]$startAnchorage$name
    lat <- port_event$event_info[[j]]$startAnchorage$lat
    lon <- port_event$event_info[[j]]$startAnchorage$lon
    vessel_name <- port_event$vessel_name
    MMSI <- port_event$vessel_ssvid
    month <- as.character(month(port_event$start))
    

    # Create the data frame with NULL-safe values
    port_event_df <- data.frame(
      port = ifelse(length(port_name) == 0, NA, port_name),
      lat = ifelse(length(lat) == 0, NA, lat),
      lon = ifelse(length(lon) == 0, NA, lon),
      vessel_name = ifelse(length(vessel_name) == 0, NA, vessel_name),
      MMSI = ifelse(length(MMSI) == 0, NA, MMSI),
      month = ifelse(length(month) == 0, NA, month),

      stringsAsFactors = FALSE
    )
    
    # Append the row to the result dataframe
    port_FV <- bind_rows(port_FV, port_event_df)
  }
}

# Remove duplicates and drop rows with NA values
port_CS_OTB <- port_FV %>% 
  unique() %>% 
  drop_na() %>% 
  mutate(quarter = case_when(
          month %in% c("1", "2", "3") ~ "1",
          month %in% c("4", "5", "6") ~ "2",
          month %in% c("7", "8", "9") ~ "3",
          month %in% c("10", "11", "12") ~ "4"  )) %>% 
  group_by(port, vessel_name, MMSI, quarter) %>% 
  summarise(lat = mean(lat), lon = mean(lon))



write.xlsx(port_CS_OTB, paste0(fd,"GFW_port_CS.xlsx"))

Descriptive analysis

GFW_port_CS = read.xlsx(paste0(fd,"GFW_port_CS.xlsx"))

GFW_port_CS %>% 
  group_by(port) %>%
  summarise(
    lon = mean(lon, na.rm = TRUE),
    lat = mean(lat, na.rm = TRUE),
    nvessel = n()) %>% 
 
  ggplot()+
  geom_bar(aes(y = reorder(port,nvessel) , x = nvessel), stat = "identity")+
  theme_light()+
  ggtitle("Harbour by number of vessels")+
  xlab("Proportion of number of vessels")+
  ylab("")

Open Fleet Register and Add Vessel length (LOA) by MMSI - Vessel name for GFW data

At this stage of the process, the port of landing has been established for each vessel. However, vessel length is still needed to complete the FDI data linkage. It should be noted that data on the length of each vessel can be extracted from the fleet register. To this end, it is recommended that the information on the vessel’s length be identified in the fleet register and joined with the MMSI reported in the GFW data. –> Unfortunately, we lost ~ 50 vessels because they do not have an MMSI associated with the fleet register

fleetReg = read.csv(paste0(wd,"vesselRegistryListResults.csv"), sep = ";")

fleetReg[fleetReg$Main.fishing.gear %in% c("TBN","OTS", "TBS", "OT"), "Main.fishing.gear"] <- "OTB"
fleetReg[fleetReg$Main.fishing.gear %in% c("SV","SX"), "Main.fishing.gear"] <- "SDN"
fleetReg[fleetReg$Main.fishing.gear %in% c("DRM", "DRH") , "Main.fishing.gear"] <- "DRB"
fleetReg[fleetReg$Main.fishing.gear %in% c("GTN","GNC", "GN", "FIX"), "Main.fishing.gear"] <- "GNS"
fleetReg[fleetReg$Main.fishing.gear %in% "SPR", "Main.fishing.gear"] <- "SSC"
fleetReg[fleetReg$Main.fishing.gear %in% c("SB", "NK"), "Main.fishing.gear"] <- "MIS"
fleetReg[fleetReg$Main.fishing.gear %in% c("LL", "LX"), "Main.fishing.gear"] <- "LLS"


fleetReg_info = fleetReg %>% 
                 rename(vessel_name = "Name.of.vessel", Gear = "Main.fishing.gear", Country ="Country.of.Registration") %>% 
                  mutate(MMSI = as.character(MMSI)) %>% 
                  filter(Country %in% Country_CS) %>% 
                  filter(Gear %in% Gear_CS) 


fleetReg_info$vlength = c("VL0006","VL0612","VL1218", "VL1824", "VL2440", "VL40XX" )[findInterval(fleetReg_info$LOA, c(0,06,12,18,24,40, 100))]

write.csv(fleetReg_info, paste0(fd,"FleetReg_info_CS.csv"), row.names = F)
fleetReg_info = read.csv(paste0(fd,"FleetReg_info_CS.csv")) %>% 
                select(vessel_name, MMSI, Gear, vlength) %>% 
                mutate(MMSI = as.character(MMSI)) %>% 
                rename(name_vreg = vessel_name)

GFW_port_CS_fReg = GFW_port_CS %>% 
                  left_join(fleetReg_info, by = "MMSI") %>% 
                  filter(MMSI %in% fleetReg_info$MMSI)
                    

GFW_effort_CS_sf_grid_fReg = read.csv(paste0(fd,"GFW_effort_CS_sf_grid.csv")) %>% 
                              mutate(MMSI = as.character(MMSI)) %>% 
                              left_join(fleetReg_info, by = "MMSI") %>% 
                              filter(MMSI %in% unique(fleetReg_info$MMSI))


write.csv(GFW_effort_CS_sf_grid_fReg, paste0(fd,"GFW_effort_CS_sf_grid_fReg.csv"))
write.csv(GFW_port_CS_fReg, paste0(fd,"GFW_port_CS_fReg.csv"), row.names = F)

Check for main ports

Subsequently, the landing ports are cleaned, keeping only the main ones. We filtered the main ports resulting from the EMODNET Main Ports database (Vessels Traffic by Type 1997-2024). Since the two datasets are not perfectly comparable, we first identify all the GFW ports that are also present in the EMODNET dataset by performing a joint search on the port name. Then, a buffer of 3 km is created around the EMODNET ports, and the GFW ports within that buffer are assigned the same name as the EMODNET ports.

During this phase, it is essential to consult a Case Study expert who will be able to manually modify the port name in the GFW dataset, where feasible.

CS_polygon <- sf::st_bbox(c(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
  crs = 4326) |>
  sf::st_as_sfc() |>
  sf::st_as_sf()

GFW_port_CS_fReg = read.csv(paste0(fd,"GFW_port_CS_fReg.csv"))

EMODNET_port_sf = read_sf(paste0(wd,"EMODnet_HA_MainPorts_Traffic_20241112/EMODnet_HA_MainPorts_Ports2025_20241112.shp")) %>% 
  filter(CNTR_CODE %in% Country_code) %>% 
  st_intersection(CS_polygon)


GFW_port_CS = GFW_port_CS_fReg 
                

GFW_port_sf = GFW_port_CS %>% st_as_sf(coords = c("lon","lat"), crs = st_crs(EMODNET_port_sf))

EMODNET_port_sf$PORT_NAME = toupper(EMODNET_port_sf$PORT_NAME)

Emo_port = EMODNET_port_sf$PORT_NAME
GFW_port = unique(GFW_port_sf$port)

Analysed the common port names between the Emodnet and GFW datasets and change name when is possible.

# Generate initial Venn diagram
v <- venn.diagram(
  list(Emo_port = Emo_port, GFW_port = GFW_port),
  fill = c("orange", "blue"),
  alpha = c(0.5, 0.5), 
  cat.cex = 1.0, 
  cex = 1.0,
  filename = NULL
)

# Inspect original labels (optional, can be commented out)
# lapply(v, function(i) i$label)

# Customize labels - these indices (5-7) may need adjustment based on your data
v[[5]]$label <- paste(setdiff(GFW_port, Emo_port), collapse = "\n")  # GFW_port only
v[[6]]$label <- paste(setdiff(Emo_port, GFW_port), collapse = "\n")  # Emo_port only
v[[7]]$label <- paste(intersect(Emo_port, GFW_port), collapse = "\n") # Intersection

# Render the plot
grid.newpage()
grid.draw(v)

GFW_port_sf = GFW_port_sf %>% 
               mutate(port = case_when(
                  port == "CALA GALERA" ~ "PORTO ERCOLE",
                  port == "GENOA" ~ "GENOVA",
                  port == "GIGLIO PORTO" ~ "ISOLA DEL GIGLIO",
                  port == "PORTO FERRAIO" ~ "PORTOFERRAIO",
                  TRUE ~ port  # Keep other values unchanged
                ))

GFW_port = unique(GFW_port_sf$port)

Portdiff = setdiff(Emo_port, GFW_port)



EMODNET_port_buffer_GFW = EMODNET_port_sf %>% 
                          filter(PORT_NAME %in% Portdiff) %>% 
                          st_buffer(dist = 3000) %>% 
                          st_intersection(GFW_port_sf)
                          
# EMODNET_port_buffer_GFW              
###inspect manually 


GFW_port_sf = GFW_port_sf %>% 
                mutate(port = case_when(
                      port == "ITA-110" ~ "RIO MARINA",
                      port == "ITA-167" ~ "CAVO",
                      TRUE ~ port ))


Emo_port = EMODNET_port_sf$PORT_NAME
GFW_port = unique(GFW_port_sf$port)
# Generate plot
v <- venn.diagram(list(Emo_port = Emo_port, GFW_port = GFW_port),
                  fill = c("orange", "blue"),
                  alpha = c(0.5, 0.5), cat.cex = 1.0, cex=1.0,
                  filename=NULL)

# lapply(v, function(i) i$label)


v[[5]]$label  <- paste(setdiff(GFW_port, Emo_port), collapse="\n")  
# in baa only
v[[6]]$label <- paste(setdiff(Emo_port, GFW_port)  , collapse="\n")  
# intesection
v[[7]]$label <- paste(intersect(Emo_port, GFW_port), collapse="\n")  

# plot  
grid.newpage()
grid.draw(v)

Finally, the GFW ports were double-checked for those without a valid port name, and the correct nearest port name was assigned to each one. See the leaflet map as an example. GFW ports without ID are in red, while correct ports are in blue. The blue pin represents the correct port. As illustrated, a comprehensive overview of all port name reassignments is provided.

GFW ports without ID are in red, while correct ports are in blue.

final_port_name = intersect(Emo_port, GFW_port)

xmin = as.numeric(st_bbox(FDI_effort_CS_sf_by_quarter)[1])-0.1
xmax = as.numeric(st_bbox(FDI_effort_CS_sf_by_quarter)[3])+0.1
ymin = as.numeric(st_bbox(FDI_effort_CS_sf_by_quarter)[2])-0.1
ymax = as.numeric(st_bbox(FDI_effort_CS_sf_by_quarter)[4])+0.1

  leaflet() %>%
  addTiles() %>%
  fitBounds(lng1 = xmin, lat1 = ymin, lng2 = xmax, lat2 = ymax) %>%
  
  # All GFW ports (red)
  addCircleMarkers(
    data = GFW_port_sf,
    radius = 5,
    color = "red",
    popup = ~ port
  ) %>%
  
  # Highlighted ports (blue)
  addCircleMarkers(
    data = GFW_port_sf[which(GFW_port_sf$port %in% final_port_name),],
    radius = 5,
    color = "blue",
    popup = ~ port
  ) %>%
  
  # Final ports (green)
  addMarkers(
    data = EMODNET_port_sf,
    popup = ~ PORT_NAME
  )
Pre-processed Port Name Resulting Port Name Rationale
CALA GALERA PORTO ERCOLE Name correction
GENOA GENOVA Name correction
GIGLIO PORTO ISOLA DEL GIGLIO Name correction
PORTO FERRAIO PORTOFERRAIO Name correction
ITA-110 RIO MARINA EMODNET port buffer
ITA-167 CAVO EMODNET port buffer
ITA-338 ANZIO Spatial correction
ITA-206 CIVITAVECCHIA Spatial correction
SANTA MARINELLA CIVITAVECCHIA Spatial correction
ITA-368 CIVITAVECCHIA Spatial correction
ITA-297 PORTO ERCOLE Spatial correction
ROMA FIUMICINO Name correction
ITA-115 PORTOFERRAIO Spatial correction
MARCIANA MARINA PORTOFERRAIO Spatial correction
MARINA DI SALIVOLI PIOMBINO Spatial correction
PORTOVENERE LA SPEZIA Spatial correction
LAVAGNA CHIAVARI Spatial correction
SESTRI LEVANTE CHIAVARI Spatial correction
CAMOGLI GENOVA Spatial correction
ITA-273 GENOVA Spatial correction
VARAZZE SAVONA Spatial correction
SAN LORENZO AL MARE IMPERIA Spatial correction
SANTO STEFANO IMPERIA Spatial correction
SAN REMO IMPERIA Name correction
SANREMO IMPERIA Spatial correction
  GFW_port_sf = GFW_port_sf %>% 
              mutate(port = case_when(
                      port == "ITA-338" ~ "ANZIO",
                      port == "ITA-206" ~ "CIVITAVECCHIA",
                      port == "SANTA MARINELLA" ~ "CIVITAVECCHIA",
                      port == "ITA-368" ~ "CIVITAVECCHIA",
                      port == "ITA-297" ~ "PORTO ERCOLE",
                      port == "ITA-115" ~ "PORTOFERRAIO",
                      port == "MARCIANA MARINA" ~ "PORTOFERRAIO",
                      port == "MARINA DI SALIVOLI" ~ "PIOMBINO",
                      port == "PORTOVENERE" ~ "LA SPEZIA",
                      port == "LAVAGNA" ~ "CHIAVARI",
                      port == "SESTRI LEVANTE" ~ "CHIAVARI",
                      port == "CAMOGLI" ~ "GENOVA",
                      port == "ITA-273" ~ "GENOVA",
                      port == "VARAZZE" ~ "SAVONA",
                      port == "SAN LORENZO AL MARE" ~ "IMPERIA",
                      port == "SANTO STEFANO" ~ "IMPERIA",
                      port == "SANREMO" ~ "IMPERIA",
                      TRUE ~ port )) 
                
  
  
  
  GFW_port = unique(GFW_port_sf$port)
  final_port_name = intersect(Emo_port, GFW_port)
  
  EMODNET_final_port = EMODNET_port_sf %>% select(PORT_NAME) %>% filter(PORT_NAME %in% final_port_name) %>% rename(port = PORT_NAME)

  
  GFW_port_sf = st_drop_geometry(GFW_port_sf) %>% 
                left_join(EMODNET_final_port) %>% 
                st_as_sf()
 
  GFW_port_sf %>% 
    mutate(n = 1) %>% 
    group_by(port) %>%
    summarise(nvessel = sum(n)) %>%
    ggplot()+
  geom_bar(aes(y = reorder(port,nvessel) , x = nvessel), stat = "identity")+
  # geom_vline(xintercept = 10, color = "red")+
  theme_light()+
  ggtitle("Harbour by number of vessels")+
  xlab("Proportion of number of vessels")+
  ylab("")

write_sf(GFW_port_sf, paste0(fd,"GFW_port_CS_fReg_coords.shp"))  
write.csv(data.frame(st_coordinates(GFW_port_sf), st_drop_geometry(GFW_port_sf)), row.names = F, paste0(fd,"GFW_port_CS_fReg_coords.csv"))

We have now obtained the number of vessels for the main ports, and we must link them to the effort. However, we are unable to retain individual boat information because some vessels are associated with many different ports. So we make last modification:

GFW_port_CS = read.csv(paste0(fd,"GFW_port_CS_fReg_coords.csv")) %>% mutate(MMSI = as.character(MMSI))

fleetReg_place_reg = read.csv(paste0(fd,"FleetReg_info_CS.csv")) %>%  
                    select(MMSI, Place.of.registration.name) %>% 
                    mutate(MMSI = as.character(MMSI))

GFW_port_fREG = GFW_port_CS %>% left_join(fleetReg_place_reg)
GFW_port_fREG <- GFW_port_fREG %>%
                  group_by(MMSI) %>%
                  mutate(
                    port_count = n_distinct(port),
                    match_port = if_else(port_count > 1, Place.of.registration.name, port),
                    final_port = if_else(is.na(match_port), port, match_port )
                  ) %>%
                  mutate(final_port = toupper(final_port)) %>% 
                  ungroup() %>%
                  select(-port_count)  

setdiff(unique(GFW_port_fREG$final_port), unique(GFW_port_fREG$port))
## [1] "SESTRI LEVANTE"          "SANTA MARGHERITA LIGURE"
## [3] "ROMA"                    "SAN REMO"
 GFW_port_fREG <- GFW_port_fREG %>%
          mutate(final_port = case_when(
                      final_port == "ROMA" ~ "FIUMICINO",
                      final_port == "SAN REMO" ~ "IMPERIA",
                      TRUE ~ final_port ))
 
 setdiff(unique(GFW_port_fREG$final_port), unique(GFW_port_fREG$port))  
## [1] "SESTRI LEVANTE"          "SANTA MARGHERITA LIGURE"
 GFW_port_fREG %>% 
   filter(final_port %in% c("SESTRI LEVANTE" , "SANTA MARGHERITA LIGURE")) %>% 
   distinct(vessel_name,final_port)
## # A tibble: 4 × 2
##   vessel_name  final_port             
##   <chr>        <chr>                  
## 1 JAZZ         SESTRI LEVANTE         
## 2 ARDITO       SANTA MARGHERITA LIGURE
## 3 SGIUSEPPE    SESTRI LEVANTE         
## 4 TERESA MADRE SANTA MARGHERITA LIGURE
  GFW_port_fREG <- GFW_port_fREG %>%
          mutate(final_port = case_when(
                      final_port == "SESTRI LEVANTE" & vessel_name == "JAZZ" ~ "CHIAVARI", 
                      final_port == "SANTA MARGHERITA LIGURE" & vessel_name == "ARDITO" ~ "CHIAVARI",
                      final_port == "SANTA MARGHERITA LIGURE" & vessel_name == "TERESA MADRE" ~ "CHIAVARI",
                      final_port == "SESTRI LEVANTE" &  vessel_name == "SGIUSEPPE" ~ "CHIAVARI",
                      TRUE ~ final_port ))
 
  setdiff(unique(GFW_port_fREG$final_port), unique(GFW_port_fREG$port)) 
## character(0)
  unique(GFW_port_fREG$final_port)
##  [1] "ANZIO"               "PORTO ERCOLE"        "PORTO SANTO STEFANO"
##  [4] "CIVITAVECCHIA"       "GENOVA"              "CHIAVARI"           
##  [7] "FIUMICINO"           "PIOMBINO"            "LIVORNO"            
## [10] "PORTOFERRAIO"        "SAVONA"              "IMPERIA"            
## [13] "PORTOFINO"           "LA SPEZIA"           "VIAREGGIO"
port_coords = unique(GFW_port_fREG[c("X","Y","port")]) %>% 
              rename(lon_port = X,
                     lat_port = Y)

GFW_port_fREG_CS = GFW_port_fREG %>% 
                    select(-c(X,Y,port,match_port)) %>% 
                    rename(port = final_port) %>% 
                    left_join(port_coords)
  

GFW_port_fREG_CS_sf = GFW_port_fREG_CS %>%
   select(port,lon_port,  lat_port) %>% 
   unique() %>% 
   st_as_sf(coords = c("lon_port", "lat_port"), crs = st_crs(GSA))

 ggplot() +
  geom_sf(data = world)+
  geom_sf(data = GFW_port_fREG_CS_sf)+
  geom_label_repel(
      data = GFW_port_fREG_CS %>% select(port,lon_port,  lat_port) %>% 
   unique() ,
      aes(x = lon_port, y = lat_port, label = port),
      size = 3,
      min.segment.length = 0
    )   +
  coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax))+
  theme_minimal()+
  ggtitle("Map of main port for the CS")+
  xlab("Longitude")+
  ylab("Latitude")

  write.csv(GFW_port_fREG_CS, paste0(fd,"GFW_port_CS_fReg_coords.csv"), row.names = F)   
  GFW_port_fREG_sf = st_as_sf(GFW_port_fREG_CS, coords = c("lon_port","lat_port"), crs = 4326)
  
  write_sf(GFW_port_fREG_sf, paste0(fd,"GFW_port_CS_fReg_coords.shp")) 

Find the number of vessels by each port

At this phase, the number of fishing vessels by each port is calculated for each type of fishing gear and length of vessel. –> Remove quarter: The data is aggregated by year, and the seasonal variation is removed because the AIS data in this case study does not have optimal resolution.

GFW_port_fREG = read.csv(paste0(fd,"GFW_port_CS_fReg_coords.csv")) %>% 
              select(MMSI, Gear, vlength, port, lon_port, lat_port) %>% 
                   mutate(MMSI = as.character(MMSI)) %>% 
                    unique()

GFW_effort_CS_sf = read_sf(paste0(fd,"GFW_effort_CS_sf_grid.shp")) %>% 
  filter(MMSI %in% unique(GFW_port_fREG$MMSI)) %>% 
  dplyr::select(MMSI, App_F_H, id) %>% 
  mutate(MMSI = as.character(MMSI))


GFW_effort_CS_sf =GFW_effort_CS_sf  %>% 
  left_join(GFW_port_fREG, by = "MMSI")

write_sf(GFW_effort_CS_sf, paste0(fd,"GFW_effort_port_by_icell.shp"))

Step 6 - Disaggregation process

EU DCF economic AER disaggregation by ICES cell

Compute the disaggregation of the economic data for each ICES cell. Here, we present the disaggregation of the Gross Value of Landing (GVL): The disaggregation of the GVL is carried out by distributing the aggregate value across ICES cells in proportion to the distribution of fishing effort. Specifically, the effort share is derived as the ratio of fishing days recorded within a given cell to the total fishing days corresponding to the relevant vessel length class. The resulting share is then applied to the overall GVL, yielding the value attributable to each cell. This approach ensures that the spatial allocation of economic value is systematically aligned with observed fishing activity, thereby providing a consistent and methodologically sound basis for economic analysis at a disaggregated spatial scale.

\[ GVL_{(g,l,icell)} = GVL_{(g,l)} \times Effort\ shared_{(g,l,icell)} \]

\[ Effort\ shared_{(g,l,icell)} = \frac{Fishing\ days_{(g,l,icell)}}{\sum Fishing\ days_{(g,l,icell)}} \] Where g is the gear type, l is the vessel length, and icell is the ices cell

AER = read.csv(paste0(fd,"Economic_data_wide.csv")) %>% rename(vssl_ln = vlength) %>% 
  dplyr::select(c("vssl_ln", "Fishing.days", "Energy.costs", "Gross.value.of.landings"))
  
FDI_effort_CS_sf = read_sf(paste0(fd,"FDI_effort_CS.shp"))

FDI_id = FDI_effort_CS_sf %>% 
          group_by(cscode, ger_typ, vssl_ln ) %>% 
          summarise(day = sum(ttfshdy)) %>% 
          rename(id = cscode)

FDI_vl = FDI_id %>% 
          group_by(ger_typ,vssl_ln) %>% 
          mutate(tot_day = sum(day)) %>% 
          st_drop_geometry() %>% 
          ungroup()

FDI_vl <- FDI_vl %>%
  group_by(ger_typ) %>% 
  mutate(effort_share = day / tot_day)

FDI_vl <- FDI_vl %>%
          left_join(AER, by = c("vssl_ln"))

FDI_vl <- FDI_vl %>%
  mutate(across(c(Fishing.days, Energy.costs, Gross.value.of.landings), ~ .x * effort_share, .names = "{.col}_AER")) %>% 
  dplyr::select(-c(Fishing.days,Energy.costs,Gross.value.of.landings))


write.csv(FDI_vl, paste0(fd,"FDI_AER_by_icell.csv"), row.names = F)

Add GFW data by gear and vessel length, and check differences between GFW effort data (expressed in hours of fishing) and FDI data (expressed in days of fishing) (Figure 16) to ensure data reliability.

GFW_effort_port_by_icell = read_sf(paste0(fd,"GFW_effort_port_by_icell.shp")) %>% 
  rename(vssl_ln = vlength)

FDI_AER_by_icell = read.csv(paste0(fd,"FDI_AER_by_icell.csv"))



GFW_id = GFW_effort_port_by_icell %>% 
          group_by(id, Gear, vssl_ln) %>%  
          summarise(h = sum(App_F_H)) %>% 
          st_drop_geometry()

FDI.AER_id = FDI_AER_by_icell %>% 
          group_by(id,  ger_typ, vssl_ln ) %>% 
          summarise(day = sum(Fishing.days_AER)) 

### check data effort of FDI and GFW

FDI_GFW = inner_join(GFW_id, FDI_id)                   


FDI_GFW_plot = ggplot(data = FDI_GFW, aes(x = day, y = h))+
  geom_point()+
  geom_smooth(method = "lm")+
  theme_bw()+
  xlab("FDI-AER - Fishing days \nby ices cell")+
  ylab("GFW - Fishing hours \nby ices cell")

FDI_GFW_plot

FDI effort - AER disaggregation by spatial ICES cells

Since there is a linear relationship between GFW effort and FDI fishing days, we can simply divide the FDI data by cell by the number of vessels in a given cell. The resulting map is indicative of spatial variations in the GVL within the designated case study area.

\[ GVL~by~vessel = \frac{GVL~by~cell_{(g,l,icell)}}{n.~vessel_{(g,l,icell)}} \]

Where g is the gear type, l is the vessel length, and icell is the ices cell.

gfw_df = GFW_effort_port_by_icell %>% 
          group_by(id, vssl_ln) %>% 
          mutate(n_track = n()) %>% 
          mutate(n_prop = n_track * sum(n_track)) %>% 
          ungroup()

gfw_fdi_merged_df <- gfw_df %>%
                      left_join(FDI_AER_by_icell, by = c("id", "vssl_ln"))

final_df <- gfw_fdi_merged_df %>%
    mutate(across(c(Fishing.days_AER, Energy.costs_AER, Gross.value.of.landings_AER), ~ .x / n_track, .names = "{.col}_by_vessel")) %>% 
  dplyr::select(-c(Fishing.days_AER,Energy.costs_AER,Gross.value.of.landings_AER))

Output

Map the Gross Value of Landings (GVL) resulting from the disaggregation process

map_GVL = ggplot()+
  geom_sf(data = final_df, aes(color = log(Gross.value.of.landings_AER_by_vessel)))+
  scale_color_viridis_c("log(GVL)",option = "D",  na.value = "white")+ 
  geom_sf(data = FDI_id, fill = "NA")+
  geom_sf(data = world)+
  coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax))+
  annotation_scale(location = "tl", width_hint = 0.5) +
  annotation_north_arrow(location = "tl", which_north = "true", 
                         pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
                         style = north_arrow_fancy_orienteering) +
  ggtitle(paste0("Disaggregate GVL data result - point"))+
  theme_light()

map_GVL

final_df_by_ices = FDI_id %>%
  select(geometry) %>% 
  st_join(final_df) %>% 
   ggplot +  
  geom_sf(aes(fill = log(Gross.value.of.landings_AER_by_vessel)))+
  geom_sf(data = world)+
  scale_fill_viridis_c("log(Gross Value of Landings)",option = "inferno",  na.value = "white")+ 
  coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax))+
  annotation_scale(location = "tl", width_hint = 0.5) +
  annotation_north_arrow(location = "tr", which_north = "true", 
                         pad_x = unit(0.1, "in"), pad_y = unit(0.1, "in"),
                         style = north_arrow_fancy_orienteering) +
  ggtitle(paste0("Disaggregate GVL data result - ices cells"))+
  theme_light()+
  theme(legend.position = "bottom")

final_df_by_ices

The same disaggregation procedure can be applied to each port. Since we now know the number of vessels in each port, we can disaggregate by port, gear type and vessel length.

\[ GVL\ by\ port_{(g,l,p)} = GVL_{(g,l)} \times weight_{(g,l,p)} \] \[ weight_{(g,l,p)} = \frac{n.vessel\ by\ port_{(g,l,p)}}{\sum n.vessel\ by\ port_{(g,l,p)}} \] Where g is the gear type, l is the vessel length, and p is the port.

n_port = final_df %>% 
         st_drop_geometry() %>% 
                select(MMSI, vssl_ln, port, Gear) %>% 
                unique() %>%
                group_by(Gear, vssl_ln, port) %>%
                summarise(n_vessel_port = n())


nvessel_weighted <- n_port %>%
                 group_by(Gear, port) %>% 
                 mutate(weight = n_vessel_port / sum(n_vessel_port)) 

GVL_by_vlength = FDI_AER_by_icell %>% 
                 group_by(ger_typ, vssl_ln) %>% 
                 summarise(GVL = mean(Gross.value.of.landings_AER)) %>%                  rename(Gear = ger_typ)  
                      
FDI_AER_nvessel <- GVL_by_vlength %>%
                         inner_join(nvessel_weighted, by = c("Gear", "vssl_ln"))

FDI_AER_nvessel <- FDI_AER_nvessel %>%
                          mutate(
                            GVL_by_port = GVL * weight)


print(
  FDI_AER_nvessel %>% 
    ggplot()+
    geom_bar(aes(y = reorder_within(port,GVL_by_port, vssl_ln), x = GVL_by_port, fill = vssl_ln), stat = "identity")+
    facet_wrap(~ vssl_ln, scales = "free_x")+
    scale_y_reordered() +
    theme_minimal()+
    ylab("")
)

Save data results

final_df = final_df %>% 
            select(MMSI,App_F_H, id,Gear,vssl_ln,port, lon_port, lat_port, geometry) %>% 
            rename(GFW_hours = App_F_H,
                   cscode = id,
                   vlength = vssl_ln)
            
write_sf(final_df, paste0(fd, "output_df_",Year_CS,".shp"))

2 Protocol - FDI landings (species kg and price) by port

The protocol is designed to disaggregate FDI landing data for a time-series.

Step 1 - Open and subset FDI Landing data

Open FDI landing data

Year_CS = c(2014:2022)

Total landings coverage for the case study area - resulting from FDI data

#Set parameter for the map

world <- ne_countries(scale = "medium", returnclass = "sf", continent = "europe")
world = st_transform(world, crs = st_crs(GSA))

xmin = as.numeric(st_bbox(GSA)[1])-0.0001
xmax = as.numeric(st_bbox(GSA)[3])+0.0001
ymin = as.numeric(st_bbox(GSA)[2])-0.0001
ymax = as.numeric(st_bbox(GSA)[4])+0.0001


#landing

for(g in 1 : length(Gear_CS)){

l_data = purrr::map(landing_sf, ~ .x %>%
           filter(gear_type == Gear_CS[g]))
 
plot_list <- list() 
 
 for(y in 1: length(Year_CS)){
l_plot = l_data[[y]] %>% 
        group_by(year,gear_type,sub_region,cscode) %>% 
        summarise(tot_kg = sum(totwghtlandg), tot_value = sum(totvallandg)) %>% 
       ggplot()+
         geom_sf(aes(fill = tot_value))+
         geom_sf(data = world)+
         coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax))+
         scale_fill_viridis_c(option = "D")+ 
         ggtitle(paste0("FDI EU Landings_",Gear_CS[g],"_",Year_CS[y]))+
         theme_light()

plot_list[[y]] <- l_plot
 }
 combined_plot <- wrap_plots(plotlist = plot_list, ncol = 3)
 print(combined_plot)
 
 }

Firstly, we decided to remove the first 3 years (2014-2015-2016) because there is a lack of data.

Save all the filtered data in a specific folder fd = “CaseStudy/Data/”

landing_sf =  landing_sf[!(names(landing_sf) %in% c("2014", "2015", "2016"))]

fd_price = "CaseStudy/Data_price/"

saveRDS(landing_sf, paste0(fd_price,"Landing_sf.RData"))

saveRDS(
  purrr::map(landing_sf, ~ .x %>% 
               st_drop_geometry() %>% 
              rename(id = cscode, gear = gear_type, vlength = vessel_length, tot_fish_weight = totwghtlandg,    tot_fish_value = totvallandg)),  
  paste0(fd,"landing_CS.rData"))

Step 2 - Filter and clean landing data

Following this, only species with values in all years of the time series were selected. An outlier check was then performed. The figure below represent an example of a time-series showing annual price trends for 29 marine species in the top panel. While the bottom panel represent a scatter plot for each species illustrating the relationship between annual catch (yearly_kg) and price for the same species. Red points reveal potential correlations between supply and economic value, offering insights into market actions and resource management.

landing_CS = readRDS(paste0(fd_price,"Landing_CS.RData"))

euro_species = purrr::map(
               landing_CS, ~ .x %>% 
               filter(tot_fish_weight != 0)  )

euro_species_df = do.call("rbind", euro_species)

species_all_years <- euro_species_df %>%
  group_by(gear,vlength,species) %>%  
  distinct(year, species) %>%
  summarise(n_years = n_distinct(year), .groups = "drop") %>%
  filter(n_years == length(unique(euro_species_df$year)))

euro_species_filtered <- euro_species_df %>%
                         inner_join(species_all_years, by = c("species", "vlength", "gear"))

# remove quarter

euro_species_filtered <- euro_species_filtered %>%  
                           group_by(year,vlength, gear, id,  species) %>% 
                           summarise(yearly_kg = mean(tot_fish_weight)*1000, yearly_value = mean(tot_fish_value))

In the time series check if there are some outlier.

df <- euro_species_filtered
df$year <- as.character(df$year)

vlent <- unique(df$vlength)

empty_cases <- list()


for (g in seq_along(Gear_CS)) {
  for (vl in seq_along(vlent)) {
    
    gear_filter <- Gear_CS[g]
    vlength_filter <- vlent[vl]
    
    q_data <- df %>%
      filter(gear == gear_filter, vlength == vlength_filter)
    
    if (nrow(q_data) == 0) next
    
    p_data <- q_data %>%
      group_by(year, species) %>%
      summarise(svalue = mean(yearly_value), .groups = "drop")
    
    zero_rows <- p_data %>% filter(svalue == 0)
    
    if (nrow(zero_rows) > 0) {
      for (i in seq_len(nrow(zero_rows))) {
        empty_cases[[length(empty_cases) + 1]] <- list(
          gear = gear_filter,
          vlength = vlength_filter,
          species = zero_rows$species[i],
          year = zero_rows$year[i]
        )
      }
    }

    p <- ggplot(p_data, aes(x = year, y = svalue, group = interaction(species))) +
      geom_point() +
      geom_line() +
      facet_wrap(~ species, scales = "free_y") +
      theme_minimal() +
      ggtitle(paste0("Species price time series for ", gear_filter, "_", vlength_filter)) +
      theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
    
    print(p)
    
    q <- ggplot(q_data) +
      geom_point(aes(x = yearly_kg, y = yearly_value, color = vlength)) +
      facet_wrap(~ species, scales = "free") +
      theme_minimal() +
      ggtitle(paste0("Species price scatter for ", gear_filter, "_", vlength_filter)) +
      theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
    
    print(q)
  }
}

empty_df <- do.call(rbind, lapply(empty_cases, as.data.frame))
print(empty_df)
##     gear vlength species year
## 1    OTB  VL1218     ARA 2017
## 2    OTB  VL1218     ARS 2017
## 3    OTB  VL1218     EDT 2017
## 4    OTB  VL1218     EOI 2017
## 5    OTB  VL1218     MTS 2017
## 6    OTB  VL1218     MUT 2017
## 7    OTB  VL1218     TGS 2017
## 8    OTB  VL1218     ARA 2018
## 9    OTB  VL1218     ARS 2018
## 10   OTB  VL1218     EDT 2018
## 11   OTB  VL1218     EOI 2018
## 12   OTB  VL1218     MTS 2018
## 13   OTB  VL1218     MUT 2018
## 14   OTB  VL1218     OCC 2018
## 15   OTB  VL1218     TGS 2018
## 16   OTB  VL1218     ARA 2019
## 17   OTB  VL1218     ARS 2019
## 18   OTB  VL1218     EDT 2019
## 19   OTB  VL1218     EOI 2019
## 20   OTB  VL1218     MTS 2019
## 21   OTB  VL1218     OCC 2019
## 22   OTB  VL1218     TGS 2019
## 23   OTB  VL1218     ARA 2020
## 24   OTB  VL1218     ARS 2020
## 25   OTB  VL1218     CTC 2020
## 26   OTB  VL1218     EDT 2020
## 27   OTB  VL1218     EOI 2020
## 28   OTB  VL1218     MTS 2020
## 29   OTB  VL1218     OCC 2020
## 30   OTB  VL1218     TGS 2020
## 31   OTB  VL1218     ARA 2021
## 32   OTB  VL1218     ARS 2021
## 33   OTB  VL1218     CTC 2021
## 34   OTB  VL1218     EDT 2021
## 35   OTB  VL1218     EOI 2021
## 36   OTB  VL1218     MTS 2021
## 37   OTB  VL1218     OCC 2021
## 38   OTB  VL1218     TGS 2021
## 39   OTB  VL1218     ARS 2022
## 40   OTB  VL1218     CTC 2022
## 41   OTB  VL1218     EDT 2022
## 42   OTB  VL1218     EOI 2022
## 43   OTB  VL1218     MTS 2022
## 44   OTB  VL1218     OCC 2022
## 45   OTB  VL1218     TGS 2022
## 46   OTB  VL1824     ANE 2017
## 47   OTB  VL1824     ARA 2017
## 48   OTB  VL1824     ARS 2017
## 49   OTB  VL1824     CTC 2017
## 50   OTB  VL1824     EDT 2017
## 51   OTB  VL1824     EOI 2017
## 52   OTB  VL1824     MTS 2017
## 53   OTB  VL1824     MUT 2017
## 54   OTB  VL1824     OCC 2017
## 55   OTB  VL1824     TGS 2017
## 56   OTB  VL1824     WHG 2017
## 57   OTB  VL1824     ANE 2018
## 58   OTB  VL1824     ARA 2018
## 59   OTB  VL1824     ARS 2018
## 60   OTB  VL1824     CTC 2018
## 61   OTB  VL1824     EDT 2018
## 62   OTB  VL1824     EOI 2018
## 63   OTB  VL1824     MTS 2018
## 64   OTB  VL1824     OCC 2018
## 65   OTB  VL1824     TGS 2018
## 66   OTB  VL1824     WHG 2018
## 67   OTB  VL1824     ANE 2019
## 68   OTB  VL1824     ARA 2019
## 69   OTB  VL1824     ARS 2019
## 70   OTB  VL1824     CTC 2019
## 71   OTB  VL1824     EDT 2019
## 72   OTB  VL1824     EOI 2019
## 73   OTB  VL1824     MTS 2019
## 74   OTB  VL1824     OCC 2019
## 75   OTB  VL1824     TGS 2019
## 76   OTB  VL1824     WHG 2019
## 77   OTB  VL1824     ANE 2020
## 78   OTB  VL1824     ARA 2020
## 79   OTB  VL1824     ARS 2020
## 80   OTB  VL1824     CTC 2020
## 81   OTB  VL1824     EDT 2020
## 82   OTB  VL1824     EOI 2020
## 83   OTB  VL1824     MTS 2020
## 84   OTB  VL1824     OCC 2020
## 85   OTB  VL1824     TGS 2020
## 86   OTB  VL1824     WHG 2020
## 87   OTB  VL1824     ANE 2021
## 88   OTB  VL1824     ARA 2021
## 89   OTB  VL1824     ARS 2021
## 90   OTB  VL1824     CTC 2021
## 91   OTB  VL1824     EDT 2021
## 92   OTB  VL1824     EOI 2021
## 93   OTB  VL1824     HOM 2021
## 94   OTB  VL1824     MTS 2021
## 95   OTB  VL1824     MUR 2021
## 96   OTB  VL1824     MUT 2021
## 97   OTB  VL1824     OCC 2021
## 98   OTB  VL1824     TGS 2021
## 99   OTB  VL1824     WHG 2021
## 100  OTB  VL1824     ANE 2022
## 101  OTB  VL1824     ARA 2022
## 102  OTB  VL1824     ARS 2022
## 103  OTB  VL1824     CTC 2022
## 104  OTB  VL1824     DPS 2022
## 105  OTB  VL1824     EDT 2022
## 106  OTB  VL1824     EOI 2022
## 107  OTB  VL1824     HKE 2022
## 108  OTB  VL1824     HOM 2022
## 109  OTB  VL1824     MTS 2022
## 110  OTB  VL1824     MUR 2022
## 111  OTB  VL1824     MUT 2022
## 112  OTB  VL1824     NEP 2022
## 113  OTB  VL1824     OCC 2022
## 114  OTB  VL1824     TGS 2022
## 115  OTB  VL1824     WHG 2022
## 116  OTB  VL2440     ANE 2017
## 117  OTB  VL2440     ARA 2017
## 118  OTB  VL2440     ARS 2017
## 119  OTB  VL2440     EOI 2017
## 120  OTB  VL2440     MAC 2017
## 121  OTB  VL2440     MTS 2017
## 122  OTB  VL2440     MUT 2017
## 123  OTB  VL2440     OCC 2017
## 124  OTB  VL2440     PIL 2017
## 125  OTB  VL2440     SPC 2017
## 126  OTB  VL2440     ARA 2018
## 127  OTB  VL2440     ARS 2018
## 128  OTB  VL2440     EOI 2018
## 129  OTB  VL2440     MTS 2018
## 130  OTB  VL2440     OCC 2018
## 131  OTB  VL2440     ARA 2019
## 132  OTB  VL2440     ARS 2019
## 133  OTB  VL2440     EOI 2019
## 134  OTB  VL2440     MTS 2019
## 135  OTB  VL2440     OCC 2019
## 136  OTB  VL2440     ARA 2020
## 137  OTB  VL2440     ARS 2020
## 138  OTB  VL2440     DPS 2020
## 139  OTB  VL2440     EOI 2020
## 140  OTB  VL2440     MTS 2020
## 141  OTB  VL2440     OCC 2020
## 142  OTB  VL2440     CTC 2021
## 143  OTB  VL2440     EOI 2021
## 144  OTB  VL2440     MTS 2021
## 145  OTB  VL2440     OCC 2021
## 146  OTB  VL2440     ARS 2022
## 147  OTB  VL2440     CTC 2022
## 148  OTB  VL2440     DPS 2022
## 149  OTB  VL2440     EOI 2022
## 150  OTB  VL2440     MTS 2022
## 151  OTB  VL2440     OCC 2022

Some species showed zero values across the entire time series for specific gear and vessel length combinations, so these cases were excluded.

empty_df = empty_df %>%
  distinct(gear, vlength, species, year) %>%
  group_by(gear, vlength, species) %>%
  summarise(n_zero_years = n(), .groups = "drop") %>% 
  filter(n_zero_years == length(unique(euro_species_filtered$year)))

remove_keys <- paste(empty_df$gear, empty_df$vlength, empty_df$species, sep = "_")

euro_species_clean <- euro_species_filtered %>%
                      filter(!paste(gear, vlength, species, sep = "_") %in% remove_keys)

For other species, data were missing in certain years. Nonetheless, a strong positive correlation was observed between value and kilograms. To handle the missing data, we estimated the linear relationship between value and kilograms for each species and used it to impute the missing points in the time series.

df_full <- euro_species_clean
df_model <- df_full[df_full$yearly_value != 0, ]

non_significant_models <- list()

gear_list <- unique(df_model$gear)
vlength_list <- unique(df_model$vlength)

for (g in seq_along(gear_list)) {
  for (vl in seq_along(vlength_list)) {
    
    gear_value <- gear_list[g]
    vlen_value <- vlength_list[vl]
    
    subset_model <- df_model[df_model$gear == gear_value & 
                             df_model$vlength == vlen_value, ]
    
    species_list <- unique(subset_model$species)
    
    for (s in seq_along(species_list)) {
      
      species_value <- species_list[s]
      
      subset_species_model <- subset_model[subset_model$species == species_value, ]
      
      subset_zero <- df_full$gear == gear_value & 
                     df_full$vlength == vlen_value & 
                     df_full$species == species_value & 
                     df_full$yearly_value == 0
      
      if (nrow(subset_species_model) >= 2) {
        
        model <- lm(yearly_value ~ yearly_kg, data = subset_species_model)
        model_summary <- summary(model)
        fstat <- model_summary$fstatistic
        r_squared <- model_summary$r.squared
        pval <- pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)
        
        if (!is.na(pval) && pval < 0.05 && r_squared > 0.4 && any(subset_zero)) {
          # Previsione e sostituzione
          df_full[subset_zero, "yearly_value"] <- predict(model, newdata = df_full[subset_zero, ])
        } else {
          # Modello non significativo: salva
          non_significant_models[[length(non_significant_models) + 1]] <- data.frame(
            species = species_value,
            gear = gear_value,
            vlength = vlen_value,
            p_value = ifelse(is.na(pval), NA, round(pval, 5)),
            r_squared = round(r_squared, 3)
          )
        }
      }
    }
  }
}

if (length(non_significant_models) > 0) {
  non_significant_df <- do.call(rbind, non_significant_models)
  print(non_significant_df)
} else {
  message("All models were significant.")
}
##         species gear vlength p_value r_squared
## value       HKE  OTB  VL1218 0.00054     0.301
## value1      ILL  OTB  VL1218 0.00000     0.969
## value2      MNZ  OTB  VL1218 0.00000     0.986
## value3      OCC  OTB  VL1218      NA     1.000
## value4      OCT  OTB  VL1218 0.00000     0.980
## value5      RJC  OTB  VL1218 0.00000     0.974
## value6      SCS  OTB  VL1218 0.00000     0.976
## value7      SQZ  OTB  VL1218 0.00000     0.905
## value8      WHB  OTB  VL1218 0.00000     0.940
## value9      ARY  OTB  VL1218 0.00000     0.903
## value10     CTC  OTB  VL1218 0.13643     0.190
## value11     DAB  OTB  VL1218 0.00000     0.982
## value12     JOD  OTB  VL1218 0.00000     0.763
## value13     MUR  OTB  VL1218 0.59223     0.009
## value14     PAC  OTB  VL1218 0.00000     0.946
## value15     ROL  OTB  VL1218 0.17240     0.059
## value16     SBR  OTB  VL1218 0.00000     0.858
## value17     WEG  OTB  VL1218 0.00000     0.942
## value18     DPS  OTB  VL1824 0.28945     0.045
## value19     HKE  OTB  VL1824 0.24448     0.052
## value20     NEP  OTB  VL1824 0.50660     0.017
## value21     MUT  OTB  VL1824 0.52300     0.032
## value22     ARY  OTB  VL2440 0.00000     0.956
## value23     CTC  OTB  VL2440 0.29481     0.180
## value24     DPS  OTB  VL2440 0.43113     0.052
## value25     HKE  OTB  VL2440 0.78938     0.002
## value26     ILL  OTB  VL2440 0.00000     0.999
## value27     JOD  OTB  VL2440 0.00000     0.982
## value28     MNZ  OTB  VL2440 0.00000     0.990
## value29     MUR  OTB  VL2440 0.71690     0.004
## value30     NEP  OTB  VL2440 0.75014     0.002
## value31     OCT  OTB  VL2440 0.00000     0.993
## value32     PAC  OTB  VL2440 0.00000     0.953
## value33     SQZ  OTB  VL2440 0.00000     0.942
## value34     SBA  OTB  VL2440 0.00000     0.877
## value35     SYC  OTB  VL2440 0.00000     0.975
## value36     MUT  OTB  VL2440 0.74592     0.005
## value37     ARA  OTB  VL2440 0.68407     0.022

If the relationship between value and kilograms proved to be non-linear for a given gear–vessel length–species combination, missing values were instead estimated using a simplified linear regression that did not account for vessel length. In these cases, the regression was applied at the gear–species level to impute the missing data.

keys_nosig <- paste(non_significant_df$gear, non_significant_df$species, sep = "_")

df_full %>% 
  filter(paste(gear, species, sep = "_") %in% keys_nosig) %>% 
  ggplot()+
  geom_point(aes(x = yearly_kg, y = yearly_value))+
  facet_wrap(~gear+species, scales = "free")+
  theme_minimal()+
  ggtitle("Find correlation [Species-Gear] for no significant combination [Species-Vlength-Gear]")

keys_nosig <- paste(non_significant_df$gear, non_significant_df$vlength, non_significant_df$species, sep = "_")

df_model_gs <- df_full %>% 
  filter(paste(gear, vlength, species, sep = "_") %in% keys_nosig) %>% 
  filter(yearly_value != 0)

non_significant_models <- list()

gear_list <- unique(df_model_gs$gear)

for (g in seq_along(gear_list)) {
  
  gear_value <- gear_list[g]
  subset_model <- df_model_gs[df_model_gs$gear == gear_value, ]
  
  species_list <- unique(subset_model$species)
    
    for (s in seq_along(species_list)) {
      
      species_value <- species_list[s]
      subset_species_model <- subset_model[subset_model$species == species_value, ]
      
      if (nrow(subset_species_model) >= 2) {
        
        model <- lm(yearly_value ~ yearly_kg, data = subset_species_model)
        fstat <- summary(model)$fstatistic
        pval <- pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)
        
        if (!is.na(pval) && pval <= 0.05) {
          
          rows_to_update <- which(
            paste(df_full$gear, df_full$vlength, df_full$species, sep = "_") %in% keys_nosig &
            df_full$gear == gear_value &
            df_full$species == species_value &
            df_full$yearly_value == 0
          )
          
          if (length(rows_to_update) > 0) {
            predicted <- predict(model, newdata = df_full[rows_to_update, ])
            df_full$yearly_value[rows_to_update] <- predicted
          }
          
        } else {
          non_significant_models[[length(non_significant_models) + 1]] <- data.frame(
            gear = gear_value,
            species = species_value,
            p_value = ifelse(is.na(pval), NA, round(pval, 5))
          )
        }
      }
    }
  }


if (length(non_significant_models) > 0) {
  non_significant_df <- do.call(rbind, non_significant_models)
  print(non_significant_df)
} else {
  message("All models were significant.")
}
##        gear species p_value
## value   OTB     HKE 0.37703
## value1  OTB     OCC      NA
## value2  OTB     CTC 0.50123
## value3  OTB     MUR 0.55016
## value4  OTB     ROL 0.17240
## value5  OTB     DPS 0.17565
## value6  OTB     NEP 0.45257
## value7  OTB     MUT 0.51800
## value8  OTB     ARA 0.68407
for (g in seq_along(Gear_CS)) {

  p_m = df_full %>% 
  filter(gear == Gear_CS[g]) %>% 
    filter(yearly_value > 0) %>% 
  ggplot()+
  geom_point(aes(x = yearly_kg, y = yearly_value, color = vlength))+
  facet_wrap(~vlength+species, scales = "free")+
  theme_minimal()+
  ggtitle("Positive value in the time series after cleaning")
  
  print(p_m)
}

Now we are ready for disaggregate data by landing port (Figure 23). First, we remove the remaining zero values, and we calculate the price as: \[ Price_{(g,l,s)} = \frac{value_{(g,l,s)} \; (euro)}{abundance_{(g,l,s)} \; (Kg)} \] Where g is the gear type, l is the vessel length, and s is the species.

Before disaggregation, we select only the first 20 species by landing abundance

df_full %>% 
   group_by(species, gear) %>%
   summarise(mvalue = mean(yearly_value), kg = (mean(yearly_kg))) %>%
  ggplot() +
  geom_bar(aes(x = kg, y = reorder_within(species, kg, gear), fill = gear), stat = "identity")+
  ylab("FAO species code")+
  facet_wrap(~ gear, scales = "free")+
  xlab("kg")+
  ggtitle("List of all landed species for the CS")+
  scale_y_reordered()+
  theme_minimal()
## `summarise()` has grouped output by 'species'. You can override using the
## `.groups` argument.

species_sel <- df_full %>%
  group_by(species, gear) %>%
  summarise(m_kg = mean(yearly_kg), .groups = "drop") %>%
  group_by(gear) %>%
  slice_max(m_kg, n = 20) %>%
  ungroup()  

 FDI_land_spe_filter <- df_full %>%
                    inner_join(species_sel, by = c("species", "gear")) %>% 
                    select(-m_kg) %>% 
                    mutate(price = yearly_value/yearly_kg)

Some results

FAO_sp = read.csv(paste0(wd,"ASFIS_sp_2025.csv")) %>% select("Alpha3_Code","Scientific_Name") %>% rename("species" = "Alpha3_Code")

FDI_land_spe_filter$year = as.character(FDI_land_spe_filter$year)

gear_filter = unique(FDI_land_spe_filter$gear)

for (g in seq_along(gear_filter)) {

  print(FDI_land_spe_filter %>% 
    filter(gear == gear_filter[g]) %>% 
  group_by(year, vlength, species) %>% 
  summarise(m_price = mean(price)) %>% 
  left_join(FAO_sp) %>%
  mutate(sp_id = paste0(Scientific_Name," (",species,")")) %>% 
  
        ggplot(aes(x = year, y = m_price, color = vlength,
                   group = interaction(species, vlength)
                   
                   )) +
        geom_line(size = 1.1) +
        geom_point() +
        facet_wrap(~ sp_id, scales = "free", ncol = 5) +
        theme_minimal() +
        ggtitle(paste0("Species price time series for ", gear_filter[g])) +
        ylab("Species price")+
        theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
)
  }
## `summarise()` has grouped output by 'year', 'vlength'. You can override using
## the `.groups` argument.
## Joining with `by = join_by(species)`
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

for (g in seq_along(gear_filter)) {
print(
  FDI_land_spe_filter %>%
  filter(gear == gear_filter[g]) %>% 
   group_by(vlength,species) %>%
   summarise(kg_gear = sum(yearly_kg)) %>%
  group_by(species) %>% 
  mutate(kg_tot = sum(kg_gear)) %>% 
  mutate(kg_prop = (kg_gear/kg_tot)*100) %>%
  left_join(FAO_sp) %>%
  mutate(sp_id = paste0(Scientific_Name," (",species,")")) %>% 
  ggplot() +
  geom_bar(aes(y = sp_id, x = kg_prop, fill = vlength),stat = "identity") +
  labs(title = "Kg of landing by vlength", y = "", x = "Kg prop") +
  theme_minimal()+
  theme(axis.text.y = element_text(face = "italic")))
}
## `summarise()` has grouped output by 'vlength'. You can override using the
## `.groups` argument.
## Joining with `by = join_by(species)`

Save data filtered

 saveRDS(FDI_land_spe_filter, paste0(fd_price,"FDI_land_spe_filter.RData"))

Step 3 - Calculate and add price by species

For each port, gear, and vessel length, the relative share number of vessel was then calculated by dividing the number of vessels in each category by the total number of vessels present in that port. This produced the weights used to distribute landing values and quantities across ports. \[ weight_{(g,l,p)} = \frac{n.vessel\ by\ port_{(g,l,p)}}{\sum n.vessel\ by\ port_{(g,l,p)}} \] Where g is the gear type, l is the vessel length, p is the port, and s is the species.

The previously calculated weights were then joined with the species price resulting dataset, allowing landings to be distributed across ports. Following this process, the weighted price by species were calculated.

\[ Price\ by\ port_{(g,l,p,s)} = Price_{(g,l,s)} \times weight_{(g,l,p)} \] Where g is the gear type, l is the vessel length, p is the port, and s is the species.

nvessel_port =  read_sf(paste0(fd,"output_df_2021.shp")) %>% 
                st_drop_geometry() %>% 
                select(MMSI, vlength, port, Gear) %>% 
                mutate(year = "2021") %>% 
                unique()
nvessel_port <- nvessel_port %>%
                group_by(year, Gear, vlength, port) %>%
                summarise(n_vessel_port = n())
## `summarise()` has grouped output by 'year', 'Gear', 'vlength'. You can override
## using the `.groups` argument.
nvessel_weighted <- nvessel_port %>%
                     group_by(year, Gear, port) %>% 
                     mutate(weight = n_vessel_port / sum(n_vessel_port)) %>% 
                     rename(gear = Gear)

FDI_land_spe_filter = readRDS(paste0(fd_price,"FDI_land_spe_filter.RData")) %>% 
                      group_by(year, gear, vlength, species) %>% 
                      summarise(mval = mean(yearly_value), mkg = mean(yearly_kg), mprice = mean(price))  
## `summarise()` has grouped output by 'year', 'gear', 'vlength'. You can override
## using the `.groups` argument.
FDI_land_spe_nvessel <- FDI_land_spe_filter %>%
                         inner_join(nvessel_weighted, by = c("year", "gear", "vlength"))
## Warning in inner_join(., nvessel_weighted, by = c("year", "gear", "vlength")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 149 of `x` matches multiple rows in `y`.
## ℹ Row 1 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
FDI_land_spe_nvessel <- FDI_land_spe_nvessel %>%
                          mutate(
                            mval_by_port = mval * weight,
                            mkg_by_port = mkg * weight,
                            mprice_by_port = mprice * weight)

df_port_species <- FDI_land_spe_nvessel %>%
                    group_by(year, gear, vlength, port, species) %>%
                    summarise(
                      mval = sum(mval_by_port, na.rm = TRUE),
                      mkg  = sum(mkg_by_port, na.rm = TRUE),
                      mprice = sum(mprice_by_port)) 
## `summarise()` has grouped output by 'year', 'gear', 'vlength', 'port'. You can
## override using the `.groups` argument.
df_port_species %>% 
  group_by(species, gear, vlength) %>% 
  summarise(mean_price = mean(mprice)) %>% 
  
ggplot( aes(y = reorder_within(species, mean_price, interaction(gear, vlength)), x = mean_price, fill = species)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ gear + vlength, scales = "free", ncol = 4) +
  theme_minimal() +
  scale_y_reordered() +
  theme(legend.position = "none")+
  labs(title = "Average price by species and port", x = "Mean price (€)", y = "Species")
## `summarise()` has grouped output by 'species', 'gear'. You can override using
## the `.groups` argument.

Conclusion

Overall, the disaggregation protocol developed under WP2 provides a replicable and adaptable framework to enhance the spatial granularity and analytical potential of fisheries socio-economic data. While subject to certain limitations—chiefly related to data availability and resolution, the approach demonstrates the feasibility of integrating heterogeneous data sources (FDI, AER, GFW, and fleet register) to derive vessel-level and port-level indicators. This not only improves the precision of fisheries monitoring and assessment but also strengthens the foundational data layer required for advanced socio-ecological modeling within the Digital Twin of the Ocean. Future work will focus on validating this methodology across multiple regions and fleet segments, with the goal of operating it within broader simulation workflows and policy support tools.